A model for the noninvasive, habitat-inclusive estimation of upper limit abundance for synanthropes, exemplified by M. fascicularis

Accurately estimating population sizes for free-ranging animals through noninvasive methods, such as camera trap images, remains particularly limited by small datasets. To overcome this, we developed a flexible model for estimating upper limit populations and exemplified it by studying a group-living synanthrope, the long-tailed macaque (Macaca fascicularis). Habitat preference maps, based on environmental and GPS data, were generated with a maximum entropy model and combined with data obtained from camera traps, line transect distance sampling, and direct sightings to produce an expected number of individuals. The mapping between habitat preference and number of individuals was optimized through a tunable parameter ρ (inquisitiveness) that accounts for repeated observations of individuals. Benchmarking against published data highlights the high accuracy of the model. Overall, this approach combines citizen science with scientific observations and reveals the long-tailed macaque populations to be (up to 80%) smaller than expected. The model’s flexibility makes it suitable for many species, providing a scalable, noninvasive tool for wildlife conservation.


INTRODUCTION
The accurate estimation of animal population sizes across large areas from camera trap data requires careful corrections to account for the lack of individual identification, the small extent of surveyed areas, and the limited size of datasets (1).It proves most effective for the study of large mammals (2), with extrapolations to larger regions from camera trap data alone intensifying the trade-off between dataset size and model accuracy.Tracking the movement of individually marked animals via mark-recapture or GPS tagging offers a solution but unavoidably disrupts the species' movement and behavior and requires substantial amounts of scientific resources and trained labor (3).
Currently, substantial strides are being made to circumvent these limitations (4), such as the integration of Bayesian computation and spatial-temporal information in population modeling, which offer ways to reduce bias across large bodies of data (5).When applied carefully to citizen science (6) or presence-only data (7), these treatments effectively enrich the total data available on a species for abundance estimation.Still, for species in which data are relatively scarce, such integration is not yet sufficient and the need remains for alternative approaches for assessing approximate population sizes from highly constrained datasets.
Without proper individual identification, the limited resolution of camera traps increases the likelihood of counting the same individual as two or more distinct individuals (double counting); even as machine learning models become more refined, differences in luminosity, weather, and camera angle can lead to unreliable results (8).These limitations are exacerbated by the small areas of separately surveyed regions (9), creating small and likely biased datasets that are generally not suitable for country-wide extrapolations.While recent methods attempt to overcome the limitations of individual identification, like the random encounter and staying time (REST) model (10), these approaches continue to fall short for estimates across larger regions if habitat diversity is not explicitly included in the analysis or if the distribution of camera traps is not random.
Camera trap surveys, drone surveys, and the use of citizen science data provide a cost-and labor-efficient way of surveying larger areas and producing GPS locations that can be used for population estimation.Distance sampling has long been a favorable method for estimating primate density and abundance but remains highly labor intensive and requires in-depth knowledge of the region, limiting its application to small areas (11,12).Furthermore, several assumptions, such as randomization and replication, must be met for reliable population size inference (13).
Many current efforts of noninvasive sampling focus either on arboreal species, by infrared drone imaging, or on solitary species, like most camera trap efforts, making camera traps alone less efficient in characterizing group-living synanthropes when the size of the datasets is small.Although Bayesian modeling can offer an effective solution to increase datasets by combining different data sources (14), this is most reliable when the region investigated is constant for the different methods of data collection (15).As human-wildlife interfaces increase, more animal species are bound to exhibit synanthropic patterns, highlighting the need for tools to effectively estimate their populations sizes accurately.
In this study, we developed a noninvasive probability model to estimate the maximum population abundance of large-bodied semiterrestrial/terrestrial group-living animals.Probabilities for group sizes are assigned according to the number of individual animals observed using a log-normal distribution combined with a tunable variable (16).The model was specifically applied to the long-tailed macaque (Macaca fascicularis), a nonhuman primate in need of an updated population size estimate (17,18).Starting from a firstprinciples description of animal behavior, a probability model was empirically corrected by camera trap and sighting data, as well as information on species behavior and habitat preference (19).This involved combining a probability description with the explicit inclusion of variable parameters (e.g., species-species demography and habitat preference), aiming to accurately replicate the population abundance for fully mapped areas (20).This offers a flexible and transparent method to estimate maximum population abundance across larger regions: By starting from a first-principles analysis and openly incorporating species behavior-along with its inherent uncertainty and regional intraspecific variation-the model relies on a small set of variables that may be tailored to different species or particular regions.
The long-tailed macaque is widely known for its synanthropic behavior, wide range, and perceived ubiquity, often resulting in the assumption of overabundance (21,22).However, only a few attempts have been made to accurately estimate its population size (23)(24)(25).M. fascicularis is distributed from the northern coastal areas of the Rakhine including the southeastern coast of Bangladesh to the southern reaches of Indonesia and the eastern Philippines islands (23), although recent findings indicate extirpation in Bangladesh and in local areas of Cambodia and Laos (26).This generalist primate displays synanthropic versatility across various habitats, including forests, grasslands, coastal areas, agricultural lands, urban areas, and temples, and is able to use human resources (24,27).Despite substantial overlap between long-tailed macaques and humans and subsequent assertions of overabundance, recent research suggests that the long-tailed macaque population is undergoing a persistent decline.This is partially due to the overlap with humans and thereby visibility and vulnerability to threats, such as hunting and culling, and partially due to the demand for the species for the pet trade, meat trade, entertainment, and trade for biomedical and pharmaceutical research (17,18,22,26).The amount and severity of threats led to the 2022 IUCN Red List listing of the species as Endangered A3cd (18).The IUCN Red List classification of M. fascicularis is currently being contested (18,28).This and the factors outlined above make it an interesting test subject for our model, enabling us to simultaneously test the model and provide crucial information for the conservation of the species.

Habitat preference
The area under the receiver operating characteristic curve (AUC) value for the test data was 0.854, while the AUC value for the training data was 0.870, showing good model performance (fig.S1).The omission rate of the test and training data was essentially consistent with the predicted omission rate, indicating the absence of spatial autocorrelation and a good modeling effect (fig.S2) (29).The combined evaluation of AUC values and omission rates suggested that the model predictions have high accuracy and reliability.Figure 1 shows the habitat preference map for the long-tailed macaque with all environmental variables and suitability, values ranging from 0 to 1.

Maximum abundance estimation
The habitat preference-population curves primarily exhibit an increasing trend, with the curve that replicates the best estimation for the population at Keo Seima (ρ sanctuary ) (20) located between the lower and upper bounds, ρ lower and ρ upper , respectively (Fig. 2).Likewise, these curves approximate scaled versions of each other, in alignment with the expected influence of inquisitiveness ρ on abundance estimations.
When applied to the habitat preference map of Keo Seima Wildlife Sanctuary (KSWS), these curves correctly reproduce the reported populations of 854, 1566, and 3097 for lower, best, and upper ρ estimates, respectively.When applied to the habitat preference map of Cát Tiên National Park (CTNP)-whose observation data were not directly used in any of the models-the estimations closely match the predictions from direct observations: 255, 509, and 1009 for lower, best, and upper ρ values, respectively.
Considering that each spot in the model has a fixed area of 1.143 km 2 , the average of each habitat preference-population curve (1.052, 0.530, and 0.267 individuals per spot) can be used to approximate an average density of long-tailed macaque groups across large regions, namely, 0.921, 0.464, and 0.233 groups/km 2 for each increasing value of ρ.These results agree remarkably well with reported group densities in the literature, ranging from 0.6 to 1.3 groups/km 2 (30).
Upon extrapolation to the entire habitat preference map (wherein not all regions are protected areas), these curves produce a best estimate, as well as the range extremes, for the upper limit population of long-tailed macaque individuals in each country and across the entire region (Table 1).The model draws most data from Cambodia, Laos, Malaysia, and Vietnam and was parameterized against KSWS (within Cambodia) and was tested against CTNP (within Vietnam).Therefore, predictions are expected to be most accurate within these countries ("high confidence countries"; fig.S3A).The model's generality allows extrapolation to other Southeast Asian countries, albeit with lower accuracy ("lower confidence countries"; fig.S3B).

DISCUSSION
Our aim was to develop a noninvasive probability model to estimate the maximum population abundance of large-bodied semiterrestrial/ terrestrial group-living animals, hoping to overcome current problematics surrounding abundance estimation from camera trap and direct sighting data.This method for estimating the upper limit abundance of a species' population is both flexible and robust: It eases the inclusion of species-specific data to accommodate a growing body of knowledge while relying on benchmarked results to ensure model accuracy.The model is not strictly limited to a single species.Its parameters, explicitly derived from information on the species, can be adjusted to approximate a wide range of wildlife populations in a noninvasive manner.It is important to note that the calibration of this application of the model currently relies on a single site, KSWS, Cambodia.Biases in the population estimation for this region (31) are bound to be magnified in the extrapolation.Nevertheless, most biases tend to contribute to the underestimation of densities, with a smaller number of observed groups and individuals (11), thereby reiterating the upper limit estimation of this approach.Our approach includes different data collection methods, and this can lead to unknown errors and difficulties with comparisons over time (12); however, given the difficulties of obtaining data with time, labor, and funding constraints, one can argue that it is crucial to use the data available to develop initial assessments and our method provides this opportunity.Our research provides an important perspective on the population trends of the long-tailed macaque and illustrates the applicability of our model.In this assessment, we reasonably estimated the population size and group density for a social, primarily grounddwelling synanthrope.We outlined a noninvasive tool for conservation efforts of underrepresented species.
Fooden's assessments of the long-tailed macaque population size from 1990 to 2006 indicated a decrease from 5 million to 3 million individuals (25,32).Our current estimate of approximately 1 million reflects a continuous decline representing an alarming 80% reduction over approximately 35 years (33).The severity of this decline is further emphasized by the nature of the model, which overestimates the population due to its calibration in a protected area, making the true decline possibly greater.Despite the methodological differences preventing direct comparison across all regions and sites, this decline raises concerns.While the numbers of long-tailed macaques might increase locally at points across their range, especially at interfaces with human settlements (34), our analysis suggests that these local increases do not reflect overall trends across Southeast Asia.The perception of long-tailed macaque ubiquity likely results from a selection bias, whereby degraded habitats force group migration closer to human settlement, where the species is more likely to be observed and reported.
Despite this model's relative success in providing reasonable estimates for long-tailed macaque populations, discrepancies may exist on the ground due to a variety of factors.For instance, while our estimates for Laos are much higher than the 300 to 600 individuals reported by ground observations (35) (P.Phiapalath, personal observation), the lack of protection from hunting for the species implies that even areas deemed suitable habitats may not accurately reflect the true population numbers.Furthermore, given the substantial and rapid decline we report here, reductions may occur faster than our data collection efforts, despite our use of recently collected data (2020-2022).On the other hand, a 2015 report from Cambodia, based on data from 2009, estimated 3 million freeranging long-tailed macaques (36), and if this is accurate, it indicates a drastic 97.5% decline over 14 years when compared to our current findings.Although the 3 million population size estimate by the Cambodian government does not correlate well with Fooden's estimates of the same number for the entire species' range, a decline in population size is evident in Cambodia.Considering that we provide an upper limit estimate for Cambodia, we are concerned that the actual population size may be even lower.
Our results for Malaysia indicate a lower estimate compared to a previous survey in 2011, where the population size of mainland Malaysia was estimated at 127,050 individuals (37).Our estimate for entire Malaysia is 119,499, suggesting a reduction.However, we cannot estimate the percentage decline as no population size for Bornean Malaysia was provided in 2011.In Malaysia alone, 493,823 individual long-tailed macaques were reported culled from 2011 to 2018, primarily due to assessments categorizing them as conflict macaques (38).Should the reported culling amount be accurate, the population has fluctuated over time and now experienced a substantial reduction warranting a reevaluation of culling amounts and strategies.
Our estimate for Vietnam is the first nationwide estimation for the country and could be used as a baseline for future assessments and to inform conservation efforts.However, we caution against relying exclusively on this upper limit estimate as ground surveys suggest that the true population size might be lower (T.V. Bang, personal observation).Overall, we provided lower confidence estimates for Indonesia, Myanmar, Philippines, Thailand, and Singapore.Nevertheless, the model remains able to estimate population sizes across areas where direct data are not available.Hence, even for so-called lower confidence countries, our findings can serve as baseline estimates assisting in conservation management activities.
In this case study, using the long-tailed macaque, a notable drawback is the output of an upper limit for the species' population constrained by benchmarking to regions with abundant data.This limitation forces the model's optimization for protected areas, originating from the KSWS and CTNP.These areas prohibit predatory hunting and have relatively effective regulations.The same does not hold true for the entire region to which the model was extrapolated, inevitably overestimating the number of individuals in these areas.Therefore, the numbers reported herein must be considered as an upper limit abundance estimation for the species in question, although groups sizes for provisioned populations are included.Moreover, the abundance estimation model strongly relies on averages across large regions scaled to match the true population inferred from direct observations.Hence, error cancellation is more pronounced in larger study areas.As a result, population estimates for larger countries (e.g., Vietnam) or regions (e.g., the Greater Mekong) are invariably more accurate than those for individual, smaller regions (e.g., Singapore).
In addition, our long-tailed macaque presence records, sourced from various channels including citizen science platforms, lack expert verification, posing a risk of misidentification, especially concerning hybrids with other macaque species occurring across some of the M. fascicularis range.The presence data also exhibited a lack of distribution, evident in the absence of observations in some inland areas.Hence, a comprehensive, long-term monitoring approach, considering factors influencing population dynamics, is essential for accurate models.Adopting such an approach is necessary for ensuring model accuracy by closely tracking changes in populations and factors influencing population dynamics.
In conclusion, in our model, we have identified a sustained decline in the population of long-tailed macaques, consistent with observed trends across recent decades.We believe that the population estimates produced here merit use in conservation measures as baseline abundance estimates (17,22).Population estimates for longtailed macaques have been requested by international monitoring and conservation agencies for decades.Our results are the first to enable comparison across countries and regions, highlighting their suitability for informing conservation action.Our best upper limit population estimates coupled with our habitat preference map can inform national and local conservation measures, CITES (Convention on International Trade in Endangered Species of Wild Fauna and Flora) management efforts, and IUCN guidelines moving forward.Last, our model, when adapted to other species, holds potential in contributing to biodiversity conservation using both current and previous data for various species in the years to come.

MATERIALS AND METHODS
The method has been fitted to the ODMAP (Overview, Data, Model, Assessment and Prediction) protocol database and is available in the Supplementary Materials, offering a breakdown of the model's key steps to ease understanding and reproducibility (39).

Data collection
Presence data of free-ranging long-tailed macaques were collected from multiple sources in Cambodia, Indonesia, Laos, Malaysia, Myanmar, Philippines, Singapore, Thailand, and Vietnam from January 2020 to December 2022 (table S1).An extensive survey took place in southeastern Bangladesh during the study period, yet no individuals were detected.Data sources included camera traps, line transect distance sampling sightings, and direct sightings, which represent 33.18, 1.19, and 65.63% of GPS location records, respectively.A total of 3077 location records were used in the habitat preference modeling, each record capturing one or more individual sightings.
Distribution and behavior of long-tailed macaques are influenced by multiple factors, such as topography, habitat type, anthropogenic disturbances, and food and water sources.These factors can be broadly categorized into four major classes: bioclimatic, geographical, land cover, and interference variables.Bioclimatic variables reflect seasonal variations in temperature and rainfall, as well as their potential impact on macaque behavior and food availability (40).Among the 19 climate factors in WorldClim, including 11 temperature-related and 8 precipitation-related factors, principal components analysis and correlation were used to screening to identify the six most important bioclimatic factors (figs.S4 and S5) (41).This approach aimed to avoid excessive similarity among factors that could lead to multicollinearity in the maximum entropy (MaxEnt) model.Geographical variables, including elevation, aspect, and slope, can indirectly influence the macroclimate, microclimate, and vegetation types (42).Slope and aspect were extracted using the digital elevation model data from NASA Earthdata and Japan Space Systems (43), and they were processed using ArcGIS Pro spatial analysis tools.The land cover variables encompassed built areas, crop lands, grasslands, shrublands, and forests, providing insights into the terrestrial vegetation types and anthropogenic modification.Considering M. fascicularis' frequent proximity to water resources and human habitats (including roads) (13,25,44), we included two interference variables: distance to the coastline and distance to roads.On the basis of the geological and ecological patterns in Southeast Asia and usages in previous research, this study selected a total of 16 environmental factors for the modeling of long-tailed macaques (Table 2).
The environmental data underwent a consistent projection process in both ArcGIS Pro and QGIS 3.32.0,ensuring uniform layer boundaries, coordinates, and grid sizes.Subsequently, all data files were converted into ASCII (ASC) format.These variables were used in the development of a habitat preference model for M. fascicularis across the regions of Cambodia, Indonesia, Laos, Malaysia, Myanmar, Philippines, Singapore, Thailand, and Vietnam.

Model summary
In general, the model works by replicating abundance estimates for group-living animals within small, mapped areas using a corrected first-principles probability approach.The correction incorporates experimental GPS signals (s) to establish a correlation with habitat preference (h) for estimating the upper limit of the population across larger areas.Experimental GPS signals (s) represent the reported number of individuals equal to or greater than 1 for all data points.The number of observed GPS signals (s) is related to a probability distribution for the number of individuals in the group responsible for these signals-that is, P{N = n|S = s}.These probability distributions are optimized by fitting a tunable variable (ρ) to experimental data.These corrected probability distributions generate an expectation value for the group size as a function of the number of GPS signals (<N|S = s>), namely These expectation values are assigned to fixed area spots according to their corresponding geographical coordinates (rectangular regions spanning 0.0131 degrees in latitude and longitude; 1.033 km by 1.107 km north-south and east-west, respectively, for a fixed area of 1.143 km 2 ).The habitat preference for these spots is derived from the habitat preference map as an input that approximates favored areas for a particular species based on environmental factors (altitude, forest density, and water proximity).This correlation between the expected number of individuals per spot and habitat preference is then extrapolated to larger areas, generating a habitat-inclusive estimation of the maximum number of individuals within a region.

MaxEnt modeling
We used a MaxEnt modeling methodology to map the habitat preference of M. fascicularis (45), considering the influence of a total of 16 environmental variables (Table 2).MaxEnt aims to estimate a target probability distribution by finding the probability distribution of MaxEnt, requiring only presence and environmental data (46).We supplied the presence of localities as GPS points of free-ranging long-tailed macaques in Cambodia, Indonesia, Laos, Malaysia, Myanmar, Philippines, Singapore, Thailand, and Vietnam.The geospatial coordinates, encompassing longitudes and latitudes, of all observed long-tailed macaque presence locations were transformed into CSV format.Simultaneously, environmental variable datasets were converted into ASC format.The dataset was divided, assigning 75% of the records for training and allocating the remaining 25% for model testing (45).We set the "apply threshold rule" to "10 percentile training presence" and retained default values for other parameters.The output format specified as "cloglog" provides probability estimates of presence ranging between 0 and 1.We used the AUC and the omission rate to test the accuracy of the model prediction.A larger AUC indicates better performance; typically, an AUC value between 0.7 and 1.0 suggests moderate to high performance (47).Meanwhile, we assessed the matching between the omission rate in training and test presence records and the predicted omission rate, where higher matching indicates more accurate model predictions (29).

Raw data of GPS signals and model inputs
The geographical GPS coordinates corresponding to long-tailed macaques' observations from camera traps, line transects, or reported sightings were grouped together according to home range: GPS signals were grouped when closer than 800 m = (2/π km 2 ) 1/2 = the radius of a 2-km 2 circle (approximate upper bound home range of a long-tailed macaque group) (48).
The semiempirical probability model includes eight speciesspecific input parameters (Table 3), one of which (ρ) encapsulates the uncertainty in individual long-tailed macaque behavior and is thus continuously varied to match experimental measurements.N min and N max define the bounds for the minimum and maximum number of individuals in a long-tailed macaque group, respectively.These were collected by combining a literature review with personal observations from the authors.Because of the distinctive synanthropic behavior of M. fascicularis, group size varies greatly between nonprovisioned and provisioned groups (49).Provisioned ratio captures this variation as the ratio of provisioned to nonprovisioned groups in the species.N mean [1] and N mean [2] denote the mean number of individuals in a nonprovisioned ( [1]) and provisioned ( [2]) group, respectively.σ sd [1] and σ sd [2] denote the SD of N mean [1] and , respectively.The variable ρ (termed as "inquisitiveness") represents the likelihood of individuals within the species being registered as a GPS signal more than once, resulting in double counting.It is intuitively proportional to the tendency of an individual to return to a GPS detection source and be observed more than once and inversely proportional to the actual group size.

Probability model
The probability matrix P{N = n|S = s} is obtained semiempirically, supplementing a first-principles derivation with experimental data on the species.
For a specific geographical location (spot), S = s is the number of GPS signals observed and N = n is the number of long-tailed macaque individuals in the group located at that specific spot.Assuming that all individuals in the group are equally likely to be detected in GPS signals, there are n s possible ways of obtaining S = s GPS signals.For instance, consider N = 7 individuals in a spot; the number of ways to generate S = 3 is then 7*7*7 = 73 = 273.Likewise, for N = 10 individuals, the number of ways is 10 3 = 1000.
Since group sizes are not limitless, the number of individuals is bound between N min and N max .Hence, in this simplified analysis, the normalized matrix becomes The lack of individual identification within most datasets might lead to double counting of individuals.To address this, repetitions of GPS signals are treated as "dead signals, " representing observations from previously identified individuals.Hence, the realistic number of GPS signals is obtained by omitting these repetitions.
Because individual identification remains unreliable for longtailed macaque camera trap data (among many other species) despite advances in artificial intelligence (50), we introduce an empirical exponential correction (ρ) to the number of GPS signals (s) so that σ = s (1−ρ) .It represents the number of GPS signals adjusted for double counting in each geographical spot.The continuous variable ρ ∈ [0, 1] is equivalent to an inquisitiveness factor due to its effect on the adjusted number of GPS signals (σ).When ρ = 0, individual long-tailed macaques are fully noninquisitive: Every reported GPS signal (in the same spot) is treated as a unique GPS signal from nonrepeated individuals (σ = s).When ρ = 1, long-tailed macaques are fully inquisitive: All reported GPS signals (in the same spot) are treated as repeated GPS signals from the same individual (σ = 1).Thus, the semiempirical probability matrix becomes The final empirical correction to the probability model comes from the base probability distribution for P{N = n}, known to be nonuniform.We account for the split in long-tailed macaque groups between nonprovisioned ( [1]) and provisioned ( [2]) environments by normalizing two united log-normal distributions (LNorm) (51).With this as the base correction to the group size distribution, the final probability matrix becomes where sd For a small number of GPS signals (s < 45), the probability of a small (nonprovisioned) group size, N ∈ {5, 20}, dominates the distribution (Fig. 3).As the number of GPS signals increases These probability distributions lead to a sinusoidal behavior in the expectation value of N for a given S, E N (s) = <N|S = s> (Fig. 4).
E N (s) is a strictly increasing function whose lower and upper bounds approach the mean group size, N mean , for nonprovisioned and provisioned groups, respectively.This relationship enables the determination of an expected number of individual long-tailed macaques for any given number of GPS signals.

Benchmarking against KSWS (Cambodia) and comparison with CTNP (Vietnam)
To ensure the accuracy of the probability model and optimize the empirical inquisitiveness factor ρ, the population estimate was benchmarked against published data for the KSWS (10).For this area, both camera trap data and a population range estimate were available.By monitoring 40 line transects from 2010 to 2020, the previous distance sampling (20) provides a best estimate from 2020 of 1566 individual long-tailed macaques in the region, with lower and upper bounds of 792 and 3097, respectively.Camera trap data for Keo Seima (obtained through the Wildlife Conservation Society, from 2020 to 2022) total 960 GPS signals spread across 82 nonoverlapping spots (coordinates separated by more than 800 m); an estimated population of 1566 individuals is obtained under an optimal ρ sanctuary = 0.523813.
For this ρ sanctuary , every distinct individual is responsible for approximately three camera trap signals (about two of every three GPS signals are considered to be repetitions).The upper bound of 3097 is appropriately replicated for ρ upper = 1 (for which every GPS signal from every spot is treated as representing a singular individual); the lower bound of 792 is closely matched, though slightly overestimated by 8%, for ρ lower = 0 (for which all GPS signals from the same spot are treated as repetitions).These three inquisitiveness factors (ρ lower , ρ sanctuary , and ρ upper , tailored for the protected area of a wildlife sanctuary) are used to generate three different E N (s) curves, similar to the one reported above.As a brief check of model accuracy, these three parameters were applied to observations from CTNP, yielding results in a reasonable order of magnitude.Table S2 summarizes these results for the Keo Seima and Cát Tiên protected areas.
It is important to note that this optimization relies on a protected area, in which individuals are supposedly shielded from human predatory pressures.As a result, this model-in this particular case study of M. fascicularis-implicitly treats all regions as protected areas.Since this cannot be true for all regions in all countries across Southeast Asia, the results of our model for this species offer a high upper limit for population abundance.Moreover, the calibration of the model assumes the previous assessment (20) to be accurate, implying that biases in the sampling method used invariably translate to the current exemplification of the model.
The greatest challenge in correlating habitat preference (h) to the expected number of individuals (E N = <N>) is the null result penalty: Locations where groups of a given species have not been observed go unreported by researchers and receive little to no further investigation.In the attempt to account for absence data and, more accurately, relate h to E N , a diffusion factor based on the cardinality of habitat preference values in the map and in the data signal spots was used.The raw diffusion factor quantifies the total number of pixel spots in the map that share habitat preference with the data signal (Fig. 5).Our analysis focuses on different regions: the entire habitat preference map, revealing a scarcity of pixels with a habitat preference greater than 0.8 and an abundance below 0.2; the KSWS, which contains mostly pixels with a habitat preference above 0.6, indicating the likely presence of long-tailed macaques in the area; and the CTNP, showing a more Gaussian spread in habitat preference values, making it appropriate for evaluating model consistency.
These raw diffusion factors (one for each value of h) are then subject to an average mean filter of varying thresholds (52).Namely,   This way, the denominator in Eq. 6 represents the abundance of long-tailed macaques in Keo Seima predicted from raw diffusion values alone.
To obtain the final habitat preference-population curves, the expected number of individuals for a pixel spot in the dataset [value obtained from the data signals and the curve for conditional expectation of N, E N (s), in Fig. 4] is scaled by the parametrized diffusion factor corresponding to its habitat preference; hence where Individuals h = Expected number of long-tailed macaque individuals at a pixel spot of habitat preference h.
To ensure continuity in the relationship between habitat preference and population estimation (as well as to incorporate habitat preference values not explicit in the dataset), a mean filter coupled with a univariate spline is applied to the resulting association.All possible mean filter thresholds (t ∈ {0, …, 684}) are tested, yielding a four-dimensional tensor (a matrix of matrices, in which two dimensions denote the two mean filter thresholds and two dimensions denote the relation between habitat preference and population).This final tensor is a matrix with 684 2 = 467,856 entries, each entry of which is its own habitat preference-population curve.The mean of the two first dimensions of this tensor is used to obtain a final curve.Given each value of ρ, this process results in three habitat preference-population curves, which can be applied to any region in the habitat preference map.

Fig. 1 .
Fig. 1.Habitat preference map for M. fascicularis in Cambodia, Indonesia, Laos, Malaysia, Myanmar, Philippines, Singapore, Thailand, and Vietnam.Red and blue colors indicate a high and low habitat preference, respectively, thus suggesting a greater and smaller likelihood of.M. fascicularis inhabiting the area.

Fig. 3 .
Fig. 3. Semiempirical probability distribution for a number of individuals given a number of GPS signals. in a protected wildlife sanctuary, ρ sanctuary = 0.523813.

7 )Fig. 6 .
Fig. 6.Selected mean filters for raw diffusion in habitat preference map according to habitat preference thresholds.

Table 1 . Estimate for upper limit long-tailed macaque populations in Southeast Asia, extrapolating behavior for the protected area of KSWS, ρ sanctuary = 0.523813. Location Best upper limit estimation Extremes of upper limit estimation
*A thorough investigation from 2021 to 2023 only revealed approximately 600 individuals in entire Laos (P.Phiapalath, personal observation).